ESP game

The ESP Game is an idea in computer science for addressing the problem of creating metadata that is difficult to generate automatically. The idea behind the game is to use the computational power of humans to perform a task that computers cannot yet do (originally, image recognition) by packaging the task as a game. It was originally conceived by Luis von Ahn of Carnegie Mellon University. Google bought a licence to create its own version of the game in 2006 in order to return better search results for its online images.[1] The licence of the data acquired by von Ahn's ESP Game, or by the Google version, is not clear. Google's version was shut down on September 16, 2011, as part of the closure of Google Labs.

Idea

As of 2009, image recognition is a task that computers perform very poorly. Humans are perfectly capable of it, but not necessarily willing to do it.

The applications and uses of having so many labeled images are significant; for example, more accurate image searching and accessibility for visually impaired users (by reading out an image's labels).

The idea of partnering two people to label images ensures that entered words will be accurate. Since the only thing the two partners have in common is that they both see the same image, they must enter reasonable labels to have any chance of agreeing on one.

The ESP Game as it is currently implemented encourages players to assign “obvious” labels, which are the most likely to lead to an agreement with the partner. These labels can often be deduced from the labels already present by using an appropriate language model, so they add little information to the system. A Microsoft research project built such a model, which assigns probabilities to the next label to be added. The model is then used in a program that plays the ESP Game without looking at the image.

Rules of the game

Once logged in, a user is automatically matched with a random partner. The partners do not know each other's identity and they cannot communicate. Once matched, they will both be shown the same image. Their task is to agree on a word that would be an appropriate label for the image. They both enter possible words, and once a word is entered by both partners (not necessarily at the same time), that word is agreed upon, and that word becomes a label for the image. Once they agree on a word, they are shown another image. They have two and a half minutes to label 15 images.

Both partners have the option to pass; that is, give up on an image. Once one partner passes, the other partner is shown a message that their partner wishes to pass. Both partners must pass for a new image to be shown.

Some images have “taboo” words; that is, words that cannot be entered as possible labels. These words will usually be related to the image and make the game harder because they can be words that players commonly use as guesses. Taboo words are obtained from the game itself. The first time an image is used in the game, it will have no taboo words. If the image is ever used again, it will have one taboo word: the word that resulted from the previous agreement. The next time the image is used, it will have two taboo words, and so on. Taboo words are assigned automatically by the system: once an image has been labeled enough times with the same word, that word becomes taboo so that the image will get a variety of different words as labels.
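
The matching and taboo rules described above can be illustrated with a short sketch. It is not the game's actual code: the function name play_image, the player labels "A" and "B", and the representation of guesses as a time-ordered list of (player, word) pairs are assumptions made for this example. The sketch follows the first description of taboo words given above, in which each agreed word is added to the image's taboo list; passing and the time limit are not modeled.

```python
from typing import Iterable, List, Optional, Tuple


def play_image(guesses: Iterable[Tuple[str, str]], taboo: List[str]) -> Optional[str]:
    """Return the first non-taboo word entered by both players, or None.

    `guesses` is a time-ordered sequence of (player, word) pairs, where
    player is "A" or "B" (hypothetical labels for the two partners).
    On agreement, the word is appended to `taboo`, so the next pair to
    see the same image cannot reuse it.
    """
    banned = {w.lower() for w in taboo}
    seen = {"A": set(), "B": set()}
    for player, word in guesses:
        word = word.strip().lower()
        if not word or word in banned:
            continue  # empty and taboo guesses are ignored
        other = "B" if player == "A" else "A"
        if word in seen[other]:
            # The partner entered this word earlier: agreement reached.
            taboo.append(word)
            return word
        seen[player].add(word)
    return None  # no agreement (for example, both players passed)
```

Calling play_image repeatedly with the same taboo list mimics an image being shown to successive pairs: the list grows by one word after each agreement, which pushes later pairs toward less obvious labels.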

Occasionally, the game will be played solo, without a human partner. The ESP Game itself then acts as the opponent and delivers to the single human player a series of pre-determined labels, harvested from the labels given to the image in earlier games played by real humans. This is necessary if there is an odd number of people playing the game.[2]

In late 2008, the game was rebranded under the gwap.com domain (for game with a purpose), with a new user interface. Some other games that were also created by Luis von Ahn, such as “Peekaboom” and “Phetch”, were discontinued at that point.

Cheating

Von Ahn has described countermeasures which prevent players from "cheating" the game and introducing false data into the system. By giving players occasional test images for which common labels are known, it is possible to check that players are answering honestly, and a player's guesses are only stored if they successfully label the test images.[3]

Furthermore, a label is only stored after a certain number of players (N) have agreed on it. At this point, all of the taboo lists for the image are deleted and the image is returned to the game pool as if it were a fresh image. If X is the probability of a label being incorrect despite a player having successfully labeled the test images, then after N repetitions the probability of corruption is X^N, assuming that the repetitions are independent of each other.[3]
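
The threshold rule and the X^N estimate can be illustrated as follows. This is a minimal sketch under stated assumptions, not von Ahn's implementation: the names LabelStore, threshold, record_agreement and corruption_probability are hypothetical, and the check against test images is not modeled.

```python
from collections import Counter


class LabelStore:
    """Stores a label only after `threshold` (N) separate agreements on it."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()  # (image_id, word) -> number of agreements
        self.stored = set()      # labels that have reached the threshold

    def record_agreement(self, image_id: str, word: str) -> bool:
        """Count one agreement; return True once the label is stored."""
        key = (image_id, word)
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.stored.add(key)
            return True
        return False


def corruption_probability(x: float, n: int) -> float:
    """Probability that n independent agreements are all incorrect (X^N)."""
    return x ** n
```

For example, if each agreement has a 10% chance of being wrong (X = 0.1) and N = 3, the chance that a stored label is corrupt is 0.1^3 = 0.001, provided the three agreements really are independent.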

Selecting the images

The choice of images used by the ESP Game makes a difference to the player’s experience. The game would probably be less entertaining if all the images were chosen from a single site and were all extremely similar.

The most basic strategy for picking the images is to select them at random from the Web using a small amount of filtering. This is the strategy employed in the current implementation of the game, except for two minor differences. First, once an image is randomly chosen from the Web, it is reintroduced into the game several times until it is fully labeled. Second, rather than picking the images from the Web in an online fashion, 350,000 images were collected in advance, to be fully labeled before the game moves on to the whole Web.[4]

The images were chosen using "Random Bounce Me", a website that selects a page at random from the Google database.[5] “Random Bounce Me” was queried repeatedly, each time collecting all JPEG and GIF images in the random page, except for images that did not fit the selection criteria: blank images, images that consist of a single color, images that are smaller than 20 pixels on either dimension, and images with an aspect ratio greater than 4.5 or smaller than 1/4.5. This process was repeated until 350,000 images were collected. The images were then rescaled to fit the game applet. For each session of the game, 15 different images are chosen from the set of 350,000.
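
The filtering criteria listed above can be written as a simple predicate. The following is a sketch only, with several assumptions: the function names passes_filter and pick_session are hypothetical, an image is represented by its width, height and a flat sequence of pixel values, a blank image is treated as a single-color image, and the aspect ratio is taken as width divided by height.

```python
import random
from typing import Hashable, List, Sequence


def passes_filter(width: int, height: int, pixels: Sequence[Hashable]) -> bool:
    """Return True if an image meets the selection criteria described above."""
    if width < 20 or height < 20:
        return False  # smaller than 20 pixels on either dimension
    ratio = width / height
    if ratio > 4.5 or ratio < 1 / 4.5:
        return False  # too wide or too tall
    if len(set(pixels)) <= 1:
        return False  # blank or single-color image
    return True


def pick_session(images: List, count: int = 15) -> List:
    """Choose `count` different images for one session of the game."""
    return random.sample(images, count)
```

Because random.sample draws without replacement, the images within a session are all different, which matches the description of 15 different images per session.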

Evaluation

In general it is difficult to predict whether a game will become popular. One approach, followed early on by the game's developers, is to ask participants a series of questions about how much they enjoyed playing the game. The responses were extremely positive.

Another approach is to examine usage statistics from arbitrary people playing the game online. The developers also present evidence that the labels produced using the game are indeed useful descriptions of the images. Players are not required to input words describing the images; they are never asked to describe anything. The evidence nevertheless indicates that players do input descriptive words: in searches for randomly chosen keywords using the labels generated by the game, the proportion of appropriate images returned is reported to be extremely high. In addition, a study compares the labels generated using the game with labels generated by participants who were asked to describe the images.

References

  1. ^ "Solving the web's image problem". BBC News. 2008-05-14. http://news.bbc.co.uk/1/hi/technology/7395751.stm. Retrieved 2008-12-14.
  2. ^ Luis von Ahn. Google Tech Talk on Human Computation. http://video.google.com/videoplay?docid=-8246463980976635143&q=google+tech+talks. Retrieved 2009-10-24.
  3. ^ a b Luis von Ahn. Google Tech Talk on Human Computation.
  4. ^ Luis von Ahn. "Human Computation". 2005.
  5. ^ "Google Web Search". http://www.google.com.
